8-Bit Approximations for Parallelism in Deep Learning
Abstract
The creation of practical deep learning data-products often requires parallelization across processors and computers to make deep learning feasible on large data sets, but bottlenecks in communication bandwidth make it difficult to attain good speedups through parallelism. Here we develop and test 8-bit approximation algorithms that make better use of the available bandwidth by compressing 32-bit gradients and nonlinear activations to 8-bit approximations. We show that these approximations do not decrease predictive performance on MNIST, CIFAR10, and ImageNet for both model and data parallelism and provide a data transfer speedup of 2x relative to 32-bit parallelism. We build a predictive model for speedups based on our experimental data, verify its validity on known speedup data, and show that we can obtain a speedup of 50x and more on a system of 96 GPUs, compared to a speedup of 23x for 32-bit. We compare our data types with other methods and show that 8-bit approximations achieve state-of-the-art speedups for model parallelism. Thus 8-bit approximation is an efficient method to parallelize convolutional networks on very large systems of GPUs.
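The core idea of compressing 32-bit gradients to 8-bit approximations before transfer can be sketched as follows. This is a minimal illustration using a simple per-tensor linear quantizer, not the paper's actual 8-bit data types (which are more elaborate); the function names are hypothetical.

```python
import numpy as np

def quantize_8bit(grad):
    """Compress a float32 gradient tensor to an int8 payload plus a
    per-tensor scale. Hypothetical linear quantizer for illustration;
    the paper's 8-bit data types are more sophisticated."""
    scale = float(np.max(np.abs(grad))) / 127.0
    if scale == 0.0:
        return np.zeros(grad.shape, dtype=np.int8), 1.0
    q = np.clip(np.round(grad / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_8bit(q, scale):
    """Recover an approximate float32 tensor from the 8-bit payload."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
g = rng.standard_normal(1024).astype(np.float32) * 0.01
q, s = quantize_8bit(g)
g_hat = dequantize_8bit(q, s)
print(q.nbytes, g.nbytes)  # 1024 4096: 4x fewer bytes on the wire
```

Only the int8 tensor and one float scale cross the interconnect, which is where the bandwidth saving (and hence the speedup) comes from; the rounding error per element is bounded by half the scale.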
Similar Papers
1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs
We show empirically that in SGD training of deep neural networks, one can, at no or nearly no loss of accuracy, quantize the gradients aggressively—to but one bit per value—if the quantization error is carried forward across minibatches (error feedback). This size reduction makes it feasible to parallelize SGD through data-parallelism with fast processors like recent GPUs. We implement data-par...
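The error-feedback trick described above can be sketched in a few lines: the quantization error from one minibatch is added to the next gradient before quantizing again, so the error does not accumulate. This is an illustrative sketch only; the paper's column-wise scales and other details are omitted, and the function name is hypothetical.

```python
import numpy as np

def one_bit_quantize(grad, residual):
    """Quantize a gradient to one bit per value with error feedback.

    Sketch of the idea in 1-bit SGD: add the carried-over quantization
    error to the current gradient, transmit only the sign plus one
    shared magnitude, and store the new residual for the next minibatch.
    """
    corrected = grad + residual                  # error feedback
    scale = float(np.mean(np.abs(corrected)))    # one shared magnitude
    q = np.where(corrected >= 0, scale, -scale)  # 1 bit per value + scale
    residual = corrected - q                     # error carried forward
    return q, residual

rng = np.random.default_rng(1)
residual = np.zeros(8)
for _ in range(100):
    g = rng.standard_normal(8)
    q, residual = one_bit_quantize(g, residual)
# The residual stays bounded across minibatches instead of growing,
# which is why accuracy is largely preserved despite 1-bit transfers.
```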
Deep Learning and Its Parallelization: Concepts and Instances
Contents (excerpt): 1 Introduction; 1.1 Application Background; 1.2 Performance Demands for Deep L...
Porosity classification from thin sections using image analysis and neural networks including shallow and deep learning in Jahrum formation
The porosity within a reservoir rock is a basic parameter for reservoir characterization. The present paper introduces two intelligent models for identification of porosity types using image analysis. To this end, thirteen geometrical parameters of the pores in each image were first extracted using image analysis techniques. The extracted features and their corresponding pore types ...
The Application of Least Square Support Vector Machine as a Mathematical Algorithm for Diagnosing Drilling Effectivity in Shaly Formations
The problem of slow drilling in deep shale formations occurs worldwide, causing significant expenses to the oil industry. Bit balling is widely considered the main cause of poor bit performance in shales, especially when deep shales are drilled with water-based mud. Therefore, efforts have been made to develop a model to diagnose drilling effectivity. Hence, we arrived at graphical cor...
BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
We introduce BinaryNet, a method which trains DNNs with binary weights and activations when computing parameters’ gradient. We show that it is possible to train a Multi Layer Perceptron (MLP) on MNIST and ConvNets on CIFAR-10 and SVHN with BinaryNet and achieve nearly state-of-the-art results. At run-time, BinaryNet drastically reduces memory usage and replaces most multiplications by 1-bit exc...
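The binarization described above can be illustrated with a deterministic sign function for the forward pass and a straight-through estimator for the backward pass. This is a sketch of the general technique, not BinaryNet's full training procedure; the function names are hypothetical.

```python
import numpy as np

def binarize(w):
    """Deterministic binarization to +1/-1, as used in the forward pass."""
    return np.where(w >= 0, 1.0, -1.0)

def straight_through_grad(w, grad_out):
    """Straight-through estimator: pass the incoming gradient through the
    non-differentiable sign function unchanged, but cancel it where
    |w| > 1 (the hard-tanh window), so large weights stop updating."""
    return grad_out * (np.abs(w) <= 1.0)

w = np.array([-1.5, -0.3, 0.0, 0.4, 2.0])
wb = binarize(w)                  # [-1., -1., 1., 1., 1.]
gw = straight_through_grad(w, np.ones_like(w))  # [0., 1., 1., 1., 0.]
```

Because every weight and activation is one of two values, multiplications in the forward pass reduce to sign flips and accumulations, which is the source of the memory and compute savings claimed in the abstract.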
Journal: CoRR
Volume: abs/1511.04561
Pages: -
Publication date: 2015